Search for: All records

Creators/Authors contains: "Harchaoui, Zaid"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo period.

  1. Free, publicly-accessible full text available November 4, 2025
  2. One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling a single token at a time or constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve the speed of generation. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems. 
    Free, publicly-accessible full text available November 20, 2025
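    As a concrete illustration of the token-level view described in this survey, the sketch below implements two standard decoding building blocks, temperature scaling and top-k truncation, assuming only access to a vector of next-token logits. The function name and interface are illustrative, not taken from the paper.

    ```python
    import numpy as np

    def sample_next_token(logits, temperature=1.0, top_k=50, rng=None):
        """Sample one token id from a model's next-token logits."""
        rng = rng or np.random.default_rng()
        logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
        if top_k is not None and top_k < logits.size:
            # Keep at least the top-k logits; mask everything below the
            # k-th largest value with -inf.
            cutoff = np.partition(logits, -top_k)[-top_k]
            logits = np.where(logits >= cutoff, logits, -np.inf)
        # Softmax over the surviving logits (masked entries get probability 0).
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(logits.size, p=probs)
    ```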
  3. OpenReview (Ed.)
    We consider the distributionally robust optimization (DRO) problem with a spectral risk-based uncertainty set and an f-divergence penalty. This formulation includes common risk-sensitive learning objectives such as regularized conditional value-at-risk (CVaR) and average top-k loss. We present Prospect, a stochastic gradient-based algorithm that only requires tuning a single learning rate hyperparameter, and prove that it enjoys linear convergence for smooth regularized losses. This contrasts with previous algorithms that either require tuning multiple hyperparameters or potentially fail to converge due to biased gradient estimates or inadequate regularization. Empirically, we show that Prospect can converge 2-3× faster than baselines such as stochastic gradient and stochastic saddle-point methods on distribution shift and fairness benchmarks spanning tabular, vision, and language domains.
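    To make the objective class concrete: a spectral risk is a weighted average of the sorted losses with nondecreasing weights, and the special cases named in the abstract fall out of the choice of weights (the uniform spectrum recovers empirical risk). The sketch below evaluates the objective only; it is not the Prospect algorithm.

    ```python
    import numpy as np

    def spectral_risk(losses, sigma):
        """Weighted average of sorted losses; sigma must be nondecreasing
        and sum to 1 (the spectrum)."""
        return np.sort(losses) @ np.asarray(sigma)

    losses = np.array([0.2, 1.5, 0.7, 3.0, 0.1])
    n = len(losses)

    # Empirical risk: uniform spectrum.
    erm = spectral_risk(losses, np.full(n, 1 / n))

    # Average top-k loss: weight 1/k on each of the k largest losses.
    # For k = alpha * n this is the (discrete) CVaR at level alpha.
    k = 2
    avg_topk = spectral_risk(losses, np.r_[np.zeros(n - k), np.full(k, 1 / k)])

    print(erm, avg_topk)  # 1.1 and 2.25
    ```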
  4. Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima, those around which the loss grows slowly, appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We analyze overparameterized matrix and bilinear sensing, robust principal component analysis, covariance matrix estimation, and single-hidden-layer neural networks with quadratic activation functions. In all cases, we show that flat minima, measured by the trace of the Hessian, exactly recover the ground truth under standard statistical assumptions. For matrix completion, we establish weak recovery, although empirical evidence suggests exact recovery holds here as well. We complete the paper with synthetic experiments that illustrate our findings.
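    The flatness measure used in this work, the trace of the Hessian of the loss, can be estimated without ever forming the Hessian via Hutchinson's estimator, E[vᵀHv] over random sign vectors v, using only Hessian-vector products. A minimal PyTorch sketch (the toy model below is a placeholder, not the paper's code):

    ```python
    import torch

    def hessian_trace(loss_fn, params, n_samples=10):
        """Hutchinson estimate of tr(H): average v^T H v over Rademacher
        vectors v, with Hv computed by double backprop."""
        loss = loss_fn(params)
        (grad,) = torch.autograd.grad(loss, params, create_graph=True)
        est = 0.0
        for _ in range(n_samples):
            v = torch.randint_like(params, 2) * 2.0 - 1.0  # entries in {-1, +1}
            (hv,) = torch.autograd.grad(grad, params, grad_outputs=v,
                                        retain_graph=True)
            est += (v * hv).sum().item()
        return est / n_samples

    # Toy check: loss(x) = x^T A x has Hessian 2A, so the trace is 2 * tr(A).
    A = torch.diag(torch.tensor([1.0, 2.0, 3.0]))
    x = torch.zeros(3, requires_grad=True)
    print(hessian_trace(lambda p: p @ A @ p, x))  # ~= 12.0
    ```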
  5. We consider the problem of minimizing a convex function that is evolving according to unknown and possibly stochastic dynamics, which may depend jointly on time and on the decision variable itself. Such problems abound in the machine learning and signal processing literature, under the names of concept drift, stochastic tracking, and performative prediction. We provide novel non-asymptotic convergence guarantees for stochastic algorithms with iterate averaging, focusing on bounds valid both in expectation and with high probability. The efficiency estimates we obtain clearly decouple the contributions of optimization error, gradient noise, and time drift. Notably, we identify a low drift-to-noise regime in which the tracking efficiency of the proximal stochastic gradient method benefits significantly from a step decay schedule. Numerical experiments illustrate our results. 
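    A small simulation in the spirit of this setting: track the minimizer of a drifting quadratic f_t(x) = ||x - m_t||² / 2 from noisy gradients, with iterate averaging and a step-decay schedule. The drift model and all constants below are illustrative assumptions, not the paper's experiments.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d, T = 5, 2000
    drift, noise = 0.01, 1.0            # sets the drift-to-noise regime

    m = np.zeros(d)                     # moving minimizer of f_t(x) = ||x - m_t||^2 / 2
    x, x_avg = np.zeros(d), np.zeros(d)
    step = 0.5

    for t in range(1, T + 1):
        m += drift * rng.standard_normal(d)           # unknown drift of the target
        g = (x - m) + noise * rng.standard_normal(d)  # stochastic gradient of f_t at x
        x -= step * g
        x_avg += (x - x_avg) / t                      # running iterate average
        if t % 500 == 0:
            step *= 0.5                               # step decay schedule

    print("last iterate error:", np.linalg.norm(x - m))
    print("averaged error:   ", np.linalg.norm(x_avg - m))
    ```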
  6. Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Ed.)
    Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in applied AI domains. Influence diagnostics are powerful statistical tools to identify influential datapoints or subsets of datapoints. We establish finite-sample statistical bounds, as well as computational complexity bounds, for influence functions and approximate maximum influence perturbations using efficient inverse-Hessian-vector product implementations. We illustrate our results with generalized linear models and large attention-based models on synthetic and real data.
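    For intuition behind these diagnostics: the classical influence of upweighting a training point z on a test loss is -g_testᵀ H⁻¹ g_z, where H is the Hessian of the training loss, and the inverse-Hessian-vector product can be computed with conjugate gradients from Hessian-vector products alone. A small dense sketch with made-up quantities (not the paper's implementation):

    ```python
    import numpy as np

    def conjugate_gradient(hvp, b, tol=1e-10, max_iter=100):
        """Solve H x = b given only a Hessian-vector-product oracle hvp."""
        x = np.zeros_like(b)
        r = b - hvp(x)
        p = r.copy()
        rs = r @ r
        for _ in range(max_iter):
            Hp = hvp(p)
            alpha = rs / (p @ Hp)
            x += alpha * p
            r -= alpha * Hp
            rs_new = r @ r
            if rs_new < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4))
    H = M @ M.T + 4 * np.eye(4)       # stand-in for the training-loss Hessian
    g_test = rng.standard_normal(4)   # gradient of the test loss
    g_train = rng.standard_normal(4)  # gradient of one training point's loss

    ihvp = conjugate_gradient(lambda v: H @ v, g_test)
    print(-ihvp @ g_train)            # influence of upweighting that point
    ```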
  7. Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Ed.)
    Spectral risk objectives, also called L-risks, allow learning systems to interpolate between optimizing average-case performance (as in empirical risk minimization) and worst-case performance on a task. We develop LSVRG, a stochastic algorithm to optimize these quantities by characterizing their subdifferential and addressing challenges such as the bias of subgradient estimates and the non-smoothness of the objective. We show theoretically and experimentally that out-of-the-box approaches such as stochastic subgradient and dual averaging can be hindered by bias, whereas our approach exhibits linear convergence.
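    The subdifferential characterization mentioned above has a simple form: sort the losses and weight each example's gradient by the spectrum value matched to its rank. A sketch for squared loss with a linear model follows (illustrative, not the LSVRG implementation); note that applying the same recipe within a small minibatch reweights by within-batch ranks and generally gives a biased estimate, which is the difficulty the paper addresses.

    ```python
    import numpy as np

    def spectral_risk_subgradient(w, X, y, sigma):
        """Subgradient of sum_i sigma_i * loss_[i](w) for squared loss,
        where loss_[1] <= ... <= loss_[n] are the sorted per-example losses
        and sigma is nondecreasing and sums to 1."""
        residuals = X @ w - y
        losses = 0.5 * residuals ** 2
        order = np.argsort(losses)          # example indices, smallest loss first
        weights = np.empty_like(sigma)
        weights[order] = sigma              # each example gets the sigma of its rank
        return X.T @ (weights * residuals)  # weighted sum of per-example gradients

    rng = np.random.default_rng(0)
    X, y = rng.standard_normal((8, 3)), rng.standard_normal(8)
    sigma = np.arange(1.0, 9.0)
    sigma /= sigma.sum()                    # a nondecreasing spectrum
    print(spectral_risk_subgradient(np.zeros(3), X, y, sigma))
    ```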
  8. Triangular flows, also known as Knöthe-Rosenblatt measure couplings, comprise an important building block of normalizing flow models for generative modeling and density estimation, including popular autoregressive flows such as real-valued non-volume preserving transformation models (Real NVP). We present statistical guarantees and sample complexity bounds for triangular flow statistical models. In particular, we establish the statistical consistency and the finite sample convergence rates of the minimum Kullback-Leibler divergence statistical estimator of the Knöthe-Rosenblatt measure coupling using tools from empirical process theory. Our results highlight the anisotropic geometry of function classes at play in triangular flows, shed light on optimal coordinate ordering, and lead to statistical guarantees for Jacobian flows. We conduct numerical experiments to illustrate the practical implications of our theoretical findings. 
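    The defining structure of these couplings is that the k-th output coordinate depends only on the first k input coordinates, so the Jacobian is triangular and the change-of-variables log-density needs only its diagonal. A minimal two-dimensional affine example under a standard normal base distribution (purely illustrative):

    ```python
    import numpy as np

    def log_density_triangular(x, a1, b1, a2_fn, b2_fn):
        """Log-density of x when the increasing triangular map
            z1 = a1 * x1 + b1
            z2 = a2(x1) * x2 + b2(x1)
        pushes the model density onto N(0, I): by change of variables,
        log p(x) = log N(T(x); 0, I) + log |det DT(x)|, and the triangular
        Jacobian gives det DT(x) = a1 * a2(x1)."""
        x1, x2 = x
        z1 = a1 * x1 + b1
        z2 = a2_fn(x1) * x2 + b2_fn(x1)
        log_base = -0.5 * (z1**2 + z2**2) - np.log(2 * np.pi)
        log_det = np.log(a1) + np.log(a2_fn(x1))
        return log_base + log_det

    # The second coordinate's scale depends on the first, as in an
    # autoregressive (Real NVP-style) conditioner.
    print(log_density_triangular(
        np.array([0.3, -1.2]),
        a1=1.0, b1=0.0,
        a2_fn=lambda x1: np.exp(0.5 * x1),  # positive, so the map is increasing
        b2_fn=lambda x1: 0.1 * x1,
    ))
    ```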
  9. As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE, a comparison measure for open-ended text generation, which directly compares the learnt distribution from a text generation model to the distribution of human-written text using divergence frontiers. MAUVE scales up to modern text generation models by computing information divergences in a quantized embedding space. Through an extensive empirical study on three open-ended generation tasks, we find that MAUVE identifies known properties of generated text, scales naturally with model size, and correlates with human judgments, with fewer restrictions than existing distributional evaluation metrics. 
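    A simplified sketch of the quantize-then-compare recipe behind MAUVE: embed both text collections (the embeddings are assumed given here), quantize them with a shared k-means, and trace the divergence frontier between the two induced histograms. Bin counts, smoothing, and the scalar summarization all differ from the released MAUVE implementation.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def divergence_frontier(human_emb, model_emb, n_bins=8, n_lambdas=25, eps=1e-12):
        """Quantize both embedding sets with shared k-means, then trace the
        frontier of KL divergences against mixtures of the two histograms."""
        km = KMeans(n_clusters=n_bins, n_init=10, random_state=0)
        km.fit(np.vstack([human_emb, model_emb]))
        p = np.bincount(km.predict(human_emb), minlength=n_bins) + eps
        q = np.bincount(km.predict(model_emb), minlength=n_bins) + eps
        p, q = p / p.sum(), q / q.sum()

        def kl(a, b):
            return float(np.sum(a * np.log(a / b)))

        # Each mixture weight lambda yields one point on the frontier.
        return [(kl(q, lam * p + (1 - lam) * q), kl(p, lam * p + (1 - lam) * q))
                for lam in np.linspace(0.01, 0.99, n_lambdas)]

    # Toy usage with random vectors standing in for text embeddings.
    rng = np.random.default_rng(0)
    frontier = divergence_frontier(rng.standard_normal((200, 16)),
                                   0.5 + rng.standard_normal((200, 16)))
    print(frontier[:3])
    ```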